nlp_architect.data.sequential_tagging.CONLL2000

class nlp_architect.data.sequential_tagging.CONLL2000(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)[source]

    CONLL 2000 POS/chunking task data set (numpy)
    Parameters:
        - data_path (str) – directory containing the CONLL2000 files
        - sentence_length (int, optional) – number of time steps to embed the data. A None value will not truncate vectors.
        - max_word_length (int, optional) – maximum word length in characters. A None value will not truncate vectors.
        - extract_chars (bool, optional) – yield character RNN features.
        - lowercase (bool, optional) – lowercase sentence words.
    __init__(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)[source]

        Initialize self. See help(type(self)) for accurate signature.
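    A minimal usage sketch follows. The data directory path and the padding lengths are illustrative assumptions; the directory is expected to contain the train.txt and test.txt files listed under dataset_files below.

        from nlp_architect.data.sequential_tagging import CONLL2000

        # Hypothetical data directory; it should contain 'train.txt' and 'test.txt'.
        dataset = CONLL2000(
            data_path='/path/to/conll2000',
            sentence_length=50,   # pad/truncate each sentence to 50 time steps (assumed value)
            max_word_length=20,   # pad/truncate each word to 20 characters (assumed value)
            extract_chars=True,   # also yield character RNN features
            lowercase=True,       # lowercase sentence words
        )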
    Methods

        __init__(data_path[, sentence_length, …])    Initialize self.

    Attributes

        char_vocab       character Vocabulary
        chunk_vocab      chunk label Vocabulary
        dataset_files
        pos_vocab        POS label Vocabulary
        test_set         get the test set
        train_set        get the train set
        word_vocab       word Vocabulary
char_vocab
¶ character Vocabulary
-
chunk_vocab
¶ chunk label Vocabulary
-
dataset_files
= {'test': 'test.txt', 'train': 'train.txt'}¶
-
pos_vocab
¶ pos label Vocabulary
-
test_set
¶ get the test set
-
train_set
¶ get the train set
-
word_vocab
¶ word Vocabulary
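    A rough sketch of how the attributes above might be consumed. The data path is again an assumption, and the exact shapes of the returned numpy arrays depend on the sentence_length and max_word_length settings.

        from nlp_architect.data.sequential_tagging import CONLL2000

        dataset = CONLL2000(data_path='/path/to/conll2000')  # hypothetical path

        train = dataset.train_set      # train split (numpy)
        test = dataset.test_set        # test split (numpy)

        words = dataset.word_vocab     # word Vocabulary
        pos = dataset.pos_vocab        # POS label Vocabulary
        chunks = dataset.chunk_vocab   # chunk label Vocabulary
        chars = dataset.char_vocab     # character Vocabulary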